Affect Recognition in Conversations Using Large Language Models
Affect recognition, encompassing emotions, moods, and feelings, plays a
pivotal role in human communication. In the realm of conversational artificial
intelligence (AI), the ability to discern and respond to human affective cues
is a critical factor for creating engaging and empathetic interactions. This
study delves into the capacity of large language models (LLMs) to recognise
human affect in conversations, with a focus on both open-domain chit-chat
dialogues and task-oriented dialogues. Leveraging three diverse datasets,
namely IEMOCAP, EmoWOZ, and DAIC-WOZ, covering a spectrum of dialogues from
casual conversations to clinical interviews, we evaluated and compared LLMs'
performance in affect recognition. Our investigation explores the zero-shot and
few-shot capabilities of LLMs through in-context learning (ICL) as well as
their model capacities through task-specific fine-tuning. Additionally, this
study takes into account the potential impact of automatic speech recognition
(ASR) errors on LLM predictions. With this work, we aim to shed light on the
extent to which LLMs can replicate human-like affect recognition capabilities
in conversations.
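The zero-shot and few-shot settings mentioned above differ only in whether labelled demonstrations are placed in the prompt. The following is a minimal sketch of that idea; the prompt wording, the function name `build_affect_prompt`, and the example labels are illustrative assumptions, not the prompts used in the study.

```python
def build_affect_prompt(dialogue, labels, demonstrations=()):
    """Build a text prompt asking for the emotion of the last utterance.

    dialogue: list of utterance strings (hypothetical format).
    labels: candidate emotion labels (illustrative, not a dataset's label set).
    demonstrations: (dialogue, label) pairs; empty means zero-shot.
    """
    lines = [
        "Classify the emotion of the last utterance as one of: "
        + ", ".join(labels) + "."
    ]
    # Few-shot: prepend each labelled demonstration before the query.
    for demo_dialogue, demo_label in demonstrations:
        lines.append("")
        lines.extend(demo_dialogue)
        lines.append(f"Emotion: {demo_label}")
    # The query dialogue ends with an open slot for the model to complete.
    lines.append("")
    lines.extend(dialogue)
    lines.append("Emotion:")
    return "\n".join(lines)


labels = ["neutral", "happy", "sad"]
query = ["A: How was the interview?", "B: I did not get the job."]
demo = (["A: Any news?", "B: We won the match!"], "happy")

zero_shot_prompt = build_affect_prompt(query, labels)
few_shot_prompt = build_affect_prompt(query, labels, demonstrations=[demo])
print(few_shot_prompt)
```

The same builder serves both settings, so any performance difference between them can be attributed to the demonstrations rather than to the instruction text.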
Speech-based Slot Filling using Large Language Models
Recent advances in large language models (LLMs) have demonstrated
unprecedented ability across various language tasks. This paper investigates
the potential application of LLMs to slot filling with noisy ASR
transcriptions, via both in-context learning and task-specific fine-tuning.
Dedicated prompt designs and fine-tuning approaches are proposed to improve the
robustness of LLMs for slot filling with noisy ASR transcriptions. Moreover, a
linearised knowledge injection (LKI) scheme is also proposed to integrate
dynamic external knowledge into LLMs. Experiments were performed on SLURP to
quantify the performance of LLMs, including GPT-3.5-turbo, GPT-4, LLaMA-13B and
Vicuna-13B (v1.1 and v1.5) with different ASR error rates. The use of the
proposed fine-tuning together with the LKI scheme for LLaMA-13B achieved an
8.3% absolute SLU-F1 improvement compared to the strong Flan-T5-base baseline
system on a limited data setup.
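The abstract does not spell out the LKI format, so the following is only a sketch of what "linearising" external knowledge for a prompt might look like: a structured list of candidate entities is flattened into one line of text and placed ahead of the noisy transcript. The function names, the prompt wording, and the example entities are all assumptions made for illustration.

```python
def linearise_knowledge(knowledge):
    """Flatten {slot_type: [candidate values]} into a single text line."""
    parts = []
    for slot_type in sorted(knowledge):
        values = ", ".join(knowledge[slot_type])
        parts.append(f"{slot_type}: {values}")
    return "; ".join(parts)


def build_slot_filling_prompt(transcript, knowledge):
    """Combine an instruction, the linearised knowledge, and the ASR transcript."""
    lines = [
        "Extract slot types and values from the (possibly noisy) ASR transcript.",
        f"Known entities: {linearise_knowledge(knowledge)}",
        f"Transcript: {transcript}",
        "Slots:",
    ]
    return "\n".join(lines)


# Hypothetical per-utterance knowledge, e.g. a user's contact list.
knowledge = {"person": ["john", "joanna"], "place_name": ["cambridge"]}
prompt = build_slot_filling_prompt(
    "email jon about the meeting in cambridge", knowledge
)
print(prompt)
```

Because the knowledge is plain text, it can change per utterance without retraining, which is one plausible reading of "dynamic" in the abstract.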
CAMELL: Confidence-based Acquisition Model for Efficient Self-supervised Active Learning with Label Validation
Supervised neural approaches are hindered by their dependence on large,
meticulously annotated datasets, a requirement that is particularly cumbersome
for sequential tasks. The quality of annotations tends to deteriorate with the
transition from expert-based to crowd-sourced labelling. To address these
challenges, we present CAMELL (Confidence-based Acquisition Model for
Efficient self-supervised active Learning with Label validation), a pool-based
active learning framework tailored for sequential multi-output problems. CAMELL
possesses three core features: (1) it requires expert annotators to label only
a fraction of a chosen sequence, (2) it facilitates self-supervision for the
remainder of the sequence, and (3) it employs a label validation mechanism to
prevent erroneous labels from contaminating the dataset and harming model
performance. We evaluate CAMELL on sequential tasks, with a special emphasis on
dialogue belief tracking, a task plagued by the constraints of limited and
noisy datasets. Our experiments demonstrate that CAMELL outperforms the
baselines in terms of efficiency. Furthermore, the data corrections suggested
by our method contribute to an overall improvement in the quality of the
resulting datasets.
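The three core features listed above can be sketched as a single labelling pass over one selected sequence. This is a simplified illustration under stated assumptions, not the CAMELL implementation: the confidence here is just the model's top class probability, and the thresholds `query_threshold` and `min_label_prob`, the function name, and the example labels are hypothetical.

```python
def label_sequence(step_probs, annotator, query_threshold=0.6, min_label_prob=0.1):
    """Label one sequence with partial annotation, self-labelling and validation.

    step_probs: one {label: probability} dict per step (model output).
    annotator: callable mapping a step index to a human-provided label.
    Returns (labels, flagged), where flagged lists steps whose human label
    was rejected by the validation check and is left as None.
    """
    labels, flagged = [], []
    for i, probs in enumerate(step_probs):
        prediction = max(probs, key=probs.get)
        if probs[prediction] >= query_threshold:
            # (2) Self-supervision: trust the model where it is confident.
            labels.append(prediction)
            continue
        # (1) Only low-confidence steps are sent to the annotator.
        proposed = annotator(i)
        if probs.get(proposed, 0.0) >= min_label_prob:
            labels.append(proposed)
        else:
            # (3) Validation: a label the model finds implausible is held back.
            labels.append(None)
            flagged.append(i)
    return labels, flagged


step_probs = [
    {"hotel": 0.9, "taxi": 0.1},
    {"hotel": 0.5, "taxi": 0.5},
    {"hotel": 0.55, "restaurant": 0.42, "taxi": 0.03},
]
human_labels = {1: "taxi", 2: "taxi"}
labels, flagged = label_sequence(step_probs, human_labels.get)
print(labels, flagged)
```

In this toy run the annotator is consulted for two of the three steps, and the label for the last step is flagged for re-checking instead of being added to the dataset.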
ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an Opportunity?
Recent research on dialogue state tracking (DST) focuses on methods that
allow few- and zero-shot transfer to new domains or schemas. However,
performance gains heavily depend on aggressive data augmentation and
fine-tuning of ever larger language model based architectures. In contrast,
general purpose language models, trained on large amounts of diverse data, hold
the promise of solving any kind of task without task-specific training. We
present preliminary experimental results on the ChatGPT research preview,
showing that ChatGPT achieves state-of-the-art performance in zero-shot DST.
Despite our findings, we argue that properties inherent to general purpose
models limit their ability to replace specialized systems. We further theorize
that the in-context learning capabilities of such models will likely become
powerful tools to support the development of dedicated and dynamic dialogue
state trackers. Comment: 13 pages, 3 figures, accepted at ACL 202
From Chatter to Matter: Addressing Critical Steps of Emotion Recognition Learning in Task-oriented Dialogue
Emotion recognition in conversations (ERC) is a crucial task for building
human-like conversational agents. While substantial efforts have been devoted
to ERC for chit-chat dialogues, the task-oriented counterpart is largely left
unattended. Directly applying chit-chat ERC models to task-oriented dialogues
(ToDs) results in suboptimal performance as these models overlook key features
such as the correlation between emotions and task completion in ToDs. In this
paper, we propose a framework that turns a chit-chat ERC model into a
task-oriented one, addressing three critical aspects: data, features and
objective. First, we devise two ways of augmenting rare emotions to improve ERC
performance. Second, we use dialogue states as auxiliary features to
incorporate key information from the goal of the user. Lastly, we leverage a
multi-aspect emotion definition in ToDs to devise a multi-task learning
objective and a novel emotion-distance weighted loss function. Our framework
yields significant improvements for a range of chit-chat ERC models on EmoWOZ,
a large-scale dataset for user emotion in ToDs. We further investigate the
generalisability of the best resulting model to predict user satisfaction in
different ToD datasets. A comparison with supervised baselines shows a strong
zero-shot capability, highlighting the potential usage of our framework in
wider scenarios. Comment: Accepted by SIGDIAL 202
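The abstract names an emotion-distance weighted loss without giving its form. One possible reading, sketched below, scales the usual cross-entropy by the expected distance between the gold emotion and the predicted distribution, so that confusing distant emotions costs more than confusing nearby ones. The formula, the three example labels, and the distance matrix are assumptions for illustration and are not taken from the paper.

```python
import math


def distance_weighted_loss(probs, gold, distance):
    """Cross-entropy scaled by the expected label distance from the gold class.

    probs: predicted probability per class (sums to 1).
    gold: index of the gold class.
    distance: distance[i][j] >= 0 between classes i and j, with distance[i][i] == 0.
    """
    expected_distance = sum(p * distance[gold][k] for k, p in enumerate(probs))
    return -(1.0 + expected_distance) * math.log(probs[gold])


# Illustrative labels: 0 = neutral, 1 = satisfied, 2 = dissatisfied.
# Hypothetical distances: opposite polarities are further apart than neutral.
distance = [
    [0, 1, 1],
    [1, 0, 2],
    [1, 2, 0],
]
gold = 1  # satisfied

# Same probability on the gold class, different placement of the remaining mass.
loss_near = distance_weighted_loss([0.3, 0.6, 0.1], gold, distance)
loss_far = distance_weighted_loss([0.1, 0.6, 0.3], gold, distance)
print(loss_near < loss_far)
```

With an all-zero distance matrix the function reduces to plain cross-entropy, which makes it straightforward to compare against an unweighted baseline.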
EmoUS: Simulating User Emotions in Task-Oriented Dialogues
Existing user simulators (USs) for task-oriented dialogue systems only model
user behaviour on semantic and natural language levels without considering the
user persona and emotions. Optimising dialogue systems with generic user
policies, which cannot model diverse user behaviour driven by different
emotional states, may result in a high drop-off rate when deployed in the real
world. Thus, we present EmoUS, a user simulator that learns to simulate user
emotions alongside user behaviour. EmoUS generates user emotions, semantic
actions, and natural language responses based on the user goal, the dialogue
history, and the user persona. By analysing what kind of system behaviour
elicits what kind of user emotions, we show that EmoUS can be used as a probe
to evaluate a variety of dialogue systems and in particular their effect on the
user's emotional state. Developing such methods is important in the age of
large language model chat-bots and rising ethical concerns. Comment: accepted by SIGIR202
The complete mitochondrial genomes of Diplonevra funebris and Diplonevra peregrina (Diptera: Phoridae)
Diplonevra is one of the most important genera in the family Phoridae. This genus is mainly distributed in the Palearctic region, and its species can be used to estimate the postmortem interval. In this study, we present, for the first time, the mitochondrial genomes of two common necrophagous species of this genus, Diplonevra funebris (Meigen, 1830) and Diplonevra peregrina (Wiedemann, 1830). A maximum-likelihood phylogenetic tree revealed that the genus Diplonevra is closely related to the genus Dohrniphora within the family Phoridae. This work expands knowledge of Phoridae genomes and contributes to the further study of species identification and phylogenetics in this family.